Skip to content

feat(rivetkit-core): expose metrics endpoint#5013

Draft
NathanFlurry wants to merge 1 commit into
05-09-wip_counter_latency_monitorfrom
counter-latency/core-metrics-endpoint
Draft

feat(rivetkit-core): expose metrics endpoint#5013
NathanFlurry wants to merge 1 commit into
05-09-wip_counter_latency_monitorfrom
counter-latency/core-metrics-endpoint

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@railway-app
Copy link
Copy Markdown

railway-app Bot commented May 11, 2026

🚅 Deployed to the rivet-pr-5013 environment in rivet-frontend

Service Status Web Updated (UTC)
kitchen-sink ❌ Build Failed (View Logs) Web May 14, 2026 at 5:27 pm
website 😴 Sleeping (View Logs) Web May 12, 2026 at 2:37 am
mcp-hub ✅ Success (View Logs) Web May 11, 2026 at 12:07 pm
frontend-cloud ❌ Build Failed (View Logs) Web May 11, 2026 at 12:06 pm
ladle ❌ Build Failed (View Logs) Web May 11, 2026 at 12:06 pm
frontend-inspector ❌ Build Failed (View Logs) Web May 11, 2026 at 12:06 pm

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 11, 2026

Code Review: feat(rivetkit-core): expose metrics endpoint

Overview

This PR adds a /metrics Prometheus endpoint to rivetkit-core, implemented across two code paths: the registry HTTP dispatcher (registry/http.rs) and the serverless runtime (serverless.rs). Authorization uses constant-time bearer token comparison via subtle::ConstantTimeEq. The endpoint is controlled by two env vars: RIVETKIT_METRICS_ENABLED=1 and RIVETKIT_METRICS_TOKEN.


Issues

1. Dead code in handle_framework_fetch match arm

The early return in handle_fetch before active_actor() means the FrameworkHttpRoute::Metrics arm inside handle_framework_fetch is unreachable:

// handle_fetch: always returns early for Metrics
if matches!(route, RegistryHttpRoute::Framework(FrameworkHttpRoute::Metrics)) {
    return handle_metrics_fetch(&request);
}
let instance = match self.active_actor(actor_id).await { ... };
// ...
// handle_framework_fetch: this arm is dead code
FrameworkHttpRoute::Metrics => handle_metrics_fetch(&request),

The intent (serving metrics without requiring an active actor) is correct, but the unreachable match arm is confusing. Consider removing Metrics from the handle_framework_fetch match and using _ => unreachable!() there (acceptable per CLAUDE.md for explicit asserts), or restructure to remove the early return and add a dedicated pre-actor check in the Err branch instead.

2. Dual env-var design is a footgun

Requiring both RIVETKIT_METRICS_ENABLED=1 AND RIVETKIT_METRICS_TOKEN is awkward. If a user sets the token but forgets the enabled flag, they get a silent 403 with no diagnostic. Presence of a non-empty RIVETKIT_METRICS_TOKEN alone should be sufficient to enable the endpoint — drop RIVETKIT_METRICS_ENABLED.

3. Inconsistent error format for Prometheus render failures (serverless path)

In serverless.rs, when render_prometheus_metrics() fails, the code calls error_response(error) which returns a structured JSON body. Prometheus scrapers expect text/plain — a JSON body causes confusing parse errors. The http.rs path correctly propagates the error with ?. The serverless path should either return a plain-text error body or propagate upward with ? as well.

4. Env vars read on every request

configured_metrics_token() reads env vars on every scrape. This means a token change takes effect without restart, which is unexpected behavior for a security credential. Consider reading once at startup and caching the result, or document this behavior explicitly.


Suggestions

  • Magic number 6 in bearer_token_from_authorization: use "bearer".len() instead of the bare literal.
  • Linear scan in authorization_bearer_token_map: headers.iter().find(|(name, _)| name.eq_ignore_ascii_case(...)) is O(n). If callers normalize header keys to lowercase, .get("authorization") is O(1).
  • Missing WWW-Authenticate header on 401: RFC 9110 §11.3 requires it. Consider WWW-Authenticate: Bearer realm="metrics".
  • 403 vs 404 for disabled endpoint: Returning 403 "metrics not enabled" reveals the endpoint exists but is disabled. 404 would be less informative to scanners. Either is defensible — worth a deliberate choice.

Positives

  • Constant-time token comparison via subtle::ConstantTimeEq is the right security call.
  • Method validation (GET-only) happens before auth, returning 405 cleanly.
  • Early return before actor resolution correctly makes metrics available when no actor is active.
  • Module visibility (pub(crate)) is appropriate.
  • Bearer token parsing handles edge cases well (empty token, case-insensitive scheme, leading/extra whitespace).

Missing

No tests added. The bearer token parser (bearer_token_from_authorization) and authorization logic have enough edge cases (empty token, case variants, whitespace) to warrant unit tests in tests/ per the CLAUDE.md Rust test layout rule. The handler paths in both http.rs and serverless.rs should also have integration-level test coverage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant